
A Synthetic Data Community for Sensitive Research Data in the UK
Welcome to the UK Synthetic Data Community Group, a collaborative network of researchers, public representatives, and data owners dedicated to advancing the responsible use of synthetic data. Our mission is to develop robust governance frameworks and open-source tools that support the use of synthetic data in sensitive data research. Together, we strive to ensure that this innovative approach enhances research possibilities while prioritising ethical considerations and data privacy. Join us in fostering a vibrant community focused on harnessing the power of synthetic data.

Advancing the Utilisation of Synthetic Data
The UK SDCG serves as a collaborative network of researchers, data owners and public representatives to advance the utilisation of synthetic data in secure environments.

Governance
Developing governance frameworks to enable the utilisation of synthetic data in secure environments.

Tools
Creating open-source tools for the research data community to generate and evaluate synthetic data.

Standards
Setting standards for the evaluation and reporting of synthetic data.

Resources
Publishing learning resources to help researchers, data owners and the public understand more about synthetic data.

What is Synthetic Data?
Synthetic data is artificially generated data that aims to mimic the properties of real-life datasets while protecting privacy. This makes it useful for many purposes, especially when working with sensitive data, which is often subject to strict governance and access controls.
With synthetic data, we can enable researcher training, algorithm development, federation of data environments and data discovery without the strict, lengthy approval processes typically associated with data stored in secure environments.

Workshops for Developing Governance & Standards in Synthetic Data
Synthetic data holds transformative potential for sensitive data, enabling innovation while safeguarding patient privacy. However, realising this potential requires robust governance and standardised frameworks to ensure safe and ethical use. This group aims to host several workshops with different stakeholders to establish clear protocols for data generation, evaluation, and deployment in secure environments. Such protocols are essential to build trust and unlock synthetic data’s full value in sensitive data applications.

Community Tools for the Generation & Evaluation of Synthetic Data
We are developing open-source tools for the generation and evaluation of synthetic data at various levels of fidelity. The SynthOpt package is built for the Trusted Research Environment (TRE) community to generate low-fidelity synthetic data using statistical techniques, or higher-fidelity synthetic data using generative AI. The package also includes several tools for the evaluation, visualisation and reporting of synthetic data to ensure it can be implemented in practice.

Use generative AI to create highly realistic synthetic data that closely resembles the real data.

Create transparency reports to ensure generation processes and evaluations are transparent.

Create structurally similar synthetic data from metadata instead of using the data itself.

Generate varying levels of synthetic data for different purposes and use cases.

Ensure privacy is protected with differentially private techniques and comprehensive evaluations.

SynthOpt Framework
A Python package to generate, evaluate and optimise synthetic data at various levels of fidelity.

Generate
Generate synthetic data using either statistical or machine learning methods. This allows the generation of structural, statistical, correlated and augmented synthetic data.
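
To illustrate the lowest fidelity level, the sketch below samples structural synthetic data from metadata alone, without touching any real records. It is not the SynthOpt API: the metadata layout, the column names and the generate_structural function are all hypothetical and chosen only for this example.

```python
import random

# Hypothetical metadata: column name -> type plus allowed range or categories.
METADATA = {
    "age": {"type": "int", "min": 40, "max": 90},
    "sex": {"type": "category", "values": ["F", "M"]},
    "score": {"type": "float", "min": 0.0, "max": 30.0},
}


def generate_structural(metadata, n_rows, seed=None):
    """Sample rows that match the declared schema, using no real records."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n_rows):
        row = {}
        for name, spec in metadata.items():
            if spec["type"] == "int":
                row[name] = rng.randint(spec["min"], spec["max"])
            elif spec["type"] == "float":
                row[name] = rng.uniform(spec["min"], spec["max"])
            elif spec["type"] == "category":
                row[name] = rng.choice(spec["values"])
            else:
                raise ValueError(f"Unsupported column type: {spec['type']}")
        rows.append(row)
    return rows


synthetic_rows = generate_structural(METADATA, n_rows=5, seed=0)
```

Data generated this way has the right column names, types and value ranges, but each column is sampled independently, so it preserves no statistical relationships from the real data. That is what makes it suitable for tasks such as code testing and researcher training.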
Evaluate
Includes metrics for evaluating privacy, utility and quality of the synthetic data, as well as a method to create automatic reports for transparency and evaluation.
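
As a rough illustration of what privacy and utility metrics can look like, the sketch below computes one simple example of each. These are not the metrics implemented in SynthOpt: the function names and the toy records are hypothetical, and real evaluations would use a broader set of measures.

```python
from statistics import mean


def exact_match_rate(real_rows, synthetic_rows):
    """Privacy proxy: fraction of synthetic rows identical to some real row."""
    real_set = {tuple(sorted(row.items())) for row in real_rows}
    matches = sum(
        tuple(sorted(row.items())) in real_set for row in synthetic_rows
    )
    return matches / len(synthetic_rows)


def mean_gap(real_rows, synthetic_rows, column):
    """Utility proxy: absolute difference between real and synthetic column means."""
    real_mean = mean(row[column] for row in real_rows)
    synthetic_mean = mean(row[column] for row in synthetic_rows)
    return abs(real_mean - synthetic_mean)


# Toy records, invented for this example only.
real = [{"age": 70, "score": 25}, {"age": 80, "score": 20}]
synthetic = [{"age": 70, "score": 25}, {"age": 76, "score": 21}]

match_rate = exact_match_rate(real, synthetic)
age_gap = mean_gap(real, synthetic, "age")
```

A lower exact match rate suggests less risk of copying real records, while a smaller gap in column means suggests the synthetic data better preserves that property of the original.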
Optimise
Optional optimisation for differentially private machine learning methods to find optimal values of the privacy parameter with respect to a privacy/utility weighting.
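
The idea of weighting privacy against utility when choosing the privacy parameter can be sketched with a much simpler setting than a machine learning model. The example below assumes a Laplace mechanism applied to a bounded mean, where the expected error relative to the value range is 1 / (n * epsilon). The scoring functions, the candidate values and the choose_epsilon helper are hypothetical and are not taken from SynthOpt.

```python
def privacy_score(epsilon):
    """Hypothetical privacy proxy: smaller epsilon gives a score closer to 1."""
    return 1.0 / (1.0 + epsilon)


def utility_score(epsilon, n_records):
    """Hypothetical utility proxy for a Laplace-noised bounded mean.

    Sensitivity is value_range / n_records and the noise scale is
    sensitivity / epsilon, so the expected absolute error relative to
    the value range is 1 / (n_records * epsilon).
    """
    relative_error = 1.0 / (n_records * epsilon)
    return 1.0 / (1.0 + relative_error)


def choose_epsilon(candidates, n_records, privacy_weight):
    """Return the candidate epsilon with the best weighted score."""

    def combined(epsilon):
        return (
            privacy_weight * privacy_score(epsilon)
            + (1.0 - privacy_weight) * utility_score(epsilon, n_records)
        )

    return max(candidates, key=combined)


CANDIDATES = [0.01, 0.1, 0.5, 1.0, 5.0, 10.0]
best_epsilon = choose_epsilon(CANDIDATES, n_records=100, privacy_weight=0.5)
```

Setting privacy_weight closer to 1 pushes the choice towards smaller values of epsilon, while values closer to 0 favour utility and therefore larger values of epsilon.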

The DARE UK Synthetic Data Community Group Co-Chairs

Lewis Hotchkiss
Dementias Platform UK

Simon Thompson
Dementias Platform UK, SeRP UK, SAIL Databank

Emma Squires
Dementias Platform UK

John Gallacher
Dementias Platform UK, University of Oxford

Timothy Rittman
University of Cambridge

Anmol Arora
University College London

Emily Oliver
Administrative Data Research UK

Sophie McCall
Research Data Scotland

Fiona Lugg-Widger
Centre for Trials Research, Cardiff University

Robert Trubey
Centre for Trials Research, Cardiff University

Cristina Magder
UK Data Service

Steve Moore
Public Representative
